Abstract: This paper presents quantitative research analyzing the correlation between international news 
sentiment and macroeconomic indicators in ASEAN countries. The news sentiment data was obtained from the 
Global Database of Events, Language, and Tone (GDELT) and the economic index data was obtained from 
the World Development Indicators (WDI) dataset provided by the World Bank. Machine-learning was used to 
determine whether the two variables are positively or negatively correlated and how one affects the other. A 
regression analysis was performed to find the form of the forecasting equation relating the two variables. Both 
variables were taken as numerical data, and the estimator was chosen by following the steps of the Scikit-Learn 
algorithm cheat sheet. All data analysis was performed in Python. The long-term 
goal is to contribute to improving macroeconomic forecasting models and to developing predictive 
models with new approaches based on the analysis of macroeconomic indexes and international sentiment, 
combining data from GDELT and WDI. The specific target of this research is to 
determine whether international mass media sentiment is positively or negatively correlated with, and how it 
affects, macroeconomic indicators in Indonesia, Malaysia, Thailand, Singapore, and Brunei 
Darussalam, and then to find the form of the regression equation of the two variables, which will be useful for the 
media in these countries when conducting international reporting. This research used a quantitative 
method, utilizing machine-learning to determine the direction of the research, defining background and goals, 
formulating research questions, and looking for relevant basic arguments for developing hypotheses, initial data 
support, research design, and alternative methodologies. The research plan starts from data collection, data 
preparation and preprocessing, data filtering, data deduplication, dataset preparation, and data normalization, then 
proceeds to statistical description, regression analysis, and t-tests, and ends with reporting the research results. 

Keywords : ASEAN, big data, GDELT, international mass media, news sentiment, macroeconomic, world 
development indicators. 





I. INTRODUCTION 
The economic sector is shaped by the behavior of actors at various levels: nations, 
companies, investors, and consumers. The majority of them are influenced by past and current economic 
conditions and by their prospects for the future (Haren, 2017). In the past 20 years, the large expansion of 
international trade has been a major driver of economic growth. The increasing integration and interconnection 
of the world economy, global crises, and new trade flow patterns are the key factors that have changed the 
dynamics of international trade (WTO, 2013). Various forecasting models for economic development 
have been developed to date. One of them is a proposed model for predicting future trade flows by combining 
factors such as the geographical location and economic size of a country (Tinbergen 1962; McCallum 1995; 
Anderson & Wincoop 2003; Rose 2004) with international relations concerning conflict and cooperation 
(Pollins 1989; Polachek et al 2007). International media sentiment can help 
explain and predict macroeconomic variables: several studies have demonstrated a significant 
relationship between sentiment and economic activity (Blanchard 1993; Angeletos & La’O 2013; Feasel & 
Kanazawa 2013) and the ability of sentiment to predict future economic developments (Hwang 2011; Barsky & 
Sims 2012; Juriova 2015). 
Recent literature has found ways to measure sentiment from text using computational techniques. 
The application of computational text analysis to digital formats enables sentiment analysis with a wider scope. 
In addition, the results of such analysis were found to be less biased than survey-based measures because 
the active participation of respondents is no longer needed. Online news articles were found to be very 
suitable for this automated method considering their important role in sentiment formation 
(Doms & Morin, 2004). On the one hand, news articles provide the public with information about economic 
conditions in the form of statistics and expert opinions. On the other hand, the tone of the articles, as well as the 
number of articles on a particular news topic, were found to affect the formation of sentiment. A recent study 
by Shapiro, Sudhof, and Wilson (2017) conducted sentiment analysis using computational text analysis and 
found that sentiment can predict future economic outcomes. In addition, their measure outperformed traditional 
sentiment measures in predicting various economic variables (Shapiro et al, 2017). 

In this study, correlation analysis is used to examine the relationship between the variables, and regression 
analysis is used to find the form of the equation relating them. The two variables are the sentiment of online 
media reporting, obtained from the Global Database of Events, Language, and Tone (GDELT), and economic 
index data, obtained from the World Development Indicators (WDI) dataset provided by the World Bank. The 
research period is 16 years, from 2003 to 2019 (GDELT 2019; World Bank 2018). The economic indexes used 
as variables in this research are Tax Revenue, Export Value, Import Value, GDP Growth, GDP Per Capita, 
Inflation, Unemployment, Foreign Direct Investment Confidence (FDI), and Current Account Balance (CAB). 
Meanwhile, the news sentiment variable, normalized using the mean subtraction method, is 
named AvgTone_Norm (World Bank, 2018). The image of a country in media reports, as reflected in the 
sentiment of news content about that country in GDELT, is processed into an average sentiment per year, and 
its correlation with the macroeconomic indicators is then analyzed. The sample countries are in the ASEAN 
region, namely Indonesia, Malaysia, Thailand, Singapore, and Brunei Darussalam. 
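
The yearly averaging and mean subtraction described above can be sketched as follows. This is a minimal 
illustration, not the pipeline used in this research: the column names Country, Year, and AvgTone and all values 
are hypothetical, and the mean is taken over all country-year rows, which is only one plausible reading of the 
normalization step. 

```python
import pandas as pd

# Hypothetical event-level records: one row per GDELT event mentioning a country.
# Column names and values are illustrative, not taken from the study's data.
events = pd.DataFrame({
    "Country": ["Indonesia", "Indonesia", "Indonesia", "Malaysia", "Malaysia", "Malaysia"],
    "Year":    [2003, 2003, 2004, 2003, 2004, 2004],
    "AvgTone": [1.5, 2.5, 4.0, -1.0, 0.5, 1.5],
})

# Step 1: average tone per country and year.
yearly = events.groupby(["Country", "Year"], as_index=False)["AvgTone"].mean()

# Step 2: mean subtraction. The mean here is taken over all country-year rows
# (an assumption); centering within each country would be another option.
yearly["AvgTone_Norm"] = yearly["AvgTone"] - yearly["AvgTone"].mean()

print(yearly)
```

After this step the normalized column has a mean of zero, which matters when the variable is later tested 
against a hypothesized mean. 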


II. LITERATURE REVIEW 

1.1 Research State of Art and Roadmap 

Research on the effect of mass media sentiment on the economic sector using the same method has 
been conducted by several researchers. Haren's research predicted the strength of dyadic 
sentiment and international trade flows from GDELT data coupled with UN Comtrade data, but its focus 
lies in the role of sentiment between countries in international trade flows. It covered 40 countries 
from Asia, America, and Europe from 1995 to 2015 and used the same machine-learning approach 
(Haren, 2017). Meanwhile, Lammers' research focuses on forecasting population migration by analyzing the 
relationship between tone and migration flows, with data taken from GDELT and the OECD, in order to 
produce a migration model that can help governments predict the impact of certain migration policies 
(Lammers, 2017). 

Other research, conducted by Wang, focused on three issues related to events in news articles: 
analysis of the performance and challenges of current large-scale event coding systems, detection of 
events and extraction of critical information from news articles, and event coding that aims 
to filter broad events and arguments out of news article texts. It did not address their effects on 
macroeconomics (Wang, 2017). The research by Dario Buono et al presents several types of big data, 
including GDELT, that can be used for macroeconomic nowcasting. It reviews big data sources, their 
availability, special characteristics, and the related literature in order to identify the types of big data that can 
be adopted in real applications, but it does not specify the countries studied or the time range of 
data collection (Buono et al, 2019). 

Embun Prowanta et al investigated the relationship between macroeconomic variables and the stock 
price indexes of five ASEAN countries (Indonesia, Malaysia, Singapore, Thailand, and the Philippines) from 
2006 to 2015, but their data retrieval did not use machine-learning or big data sources (Prowanta et al, 2017). 
In contrast, this research focuses on the effects of international mass media 
sentiment on the macroeconomic indexes of ASEAN countries (Indonesia, Malaysia, Thailand, Singapore, and 
Brunei Darussalam). The international news sentiment variable, AvgTone_Norm, is derived from GDELT data 
for 2003-2019 using machine-learning and is normalized using the mean subtraction method. 
The macroeconomic indicator variables are taken from the WDI published by the World Bank, namely Tax 
Revenue, Export Value, Import Value, GDP Growth, GDP Per Capita, Inflation, Unemployment, Foreign Direct 
Investment Confidence (FDI), and Current Account Balance (CAB), so that the forecasting results obtained are 
more comprehensive in predicting the effect of one variable on the other. 

In this research, machine-learning is used to find the right estimator for the types of data and 
problems related to news sentiment and macroeconomic indexes. The Scikit-learn flowchart in Figure 1 
serves as a rough guide and research roadmap (scikit-learn.org). 

Source: https://scikit-learn.org 
FIGURE 1. Research Roadmap 


1.2 Theoretical Basis 

The remainder of this paper is organized as follows. It provides background information on 
international news sentiment and macroeconomic indicators, describes the experimental setup, elaborates 
on the data preprocessing and the construction of the dataset, and gives an overview of the algorithms 
used. It then presents the results of the experiments, provides a general discussion, and discusses the 
limitations of this work and recommendations for future research. Finally, the conclusion 
reflects on the research questions. This research requires supporting theories, both basic and 
applied, as well as clear concepts to formulate definitively the relationship between the two variables 
examined using machine-learning. 


1.2.1 The Concept of Sentiment 

Sentiment can be described as a public evaluation of current conditions and future prospects. There is 
growing interest in the use of sentiment in analyzing, explaining, and predicting economic phenomena. To 
improve accuracy in predicting macroeconomic variables, predictive models increasingly depend on both 
objective and subjective variables. In the literature, there is agreement about the relationship between sentiment 
and future economic activity, albeit with different explanations. Blanchard (1993) and Angeletos and La'O (2013) 
use the theory of "animal spirits" to explain that economic activity is driven by sentiment (Blanchard 1993; 
Angeletos & La’O 2013). On the other hand, Barsky and Sims (2012) show that the correlation between 
sentiment and future economic activity is due to the information conveyed by the sentiment itself 
(Barsky & Sims, 2012). 


1.2.2 The Measurement of Sentiment 

Traditionally, sentiment is measured using survey-based indexes. Such a sentiment 
index is built from answers given by respondents to questions about current economic conditions and future 
prospects. Although its acceptance and application are broad, there has been some discussion about its 
usefulness in economic forecasting and about the reliability and validity of the measurements (Garner 1991; 
Kellstedt et al 2015). A more recent method uses computational techniques to extract sentiment from digital 
sources such as online newspaper articles. The advantages of text-based economic measures are low cost and 
broad coverage (Fraiberger, 2016). 


1.2.3 The Relationship of Sentiment with Mass Media News 

News articles are able to create certain opinion sentiments through the evaluation of news topics by 
the reported actors and through implicit or explicit judgments (Noelle-Neuman & Mathes, 1987). Online news 
articles were found to be very suitable for this automated method considering their important role in 
sentiment formation (Doms & Morin, 2004). On the one hand, news articles give the public information about 
economic conditions in the form of statistics and expert opinions. On the other hand, the tone of the articles, as 
well as the number of articles on a particular news topic, were found to influence the formation of sentiment. 
One finding is that the media largely influence the formation of sentiment because people update 
their beliefs according to the content and volume of news articles. In addition, text-based sentiment measures 
were found to respond faster to news events than measures based on survey responses 
(O’Conner, 2014). 


1.2.4 The Relationship of Sentiment with Economic Variables 

Several studies have shown the relationship between sentiment and macroeconomic variables and the 
predictive power of sentiment. Barsky and Sims (2012) stated that sentiment can convey information that cannot 
be observed in objective economic variables (Barsky & Sims, 2012). According to Juriova (2015), sentiment 
towards foreign trading partners can be used to explain real fluctuations in Gross Domestic Product (GDP), 
consumer prices, and exchange rates. Meanwhile, Li and Makino (2015) found that positive sentiment towards 
foreign countries facilitated foreign direct investment (FDI), while negative sentiment led to lower investment. 
They also stated that negative sentiment had a stronger impact than positive sentiment (Juriova 
2015; Li & Makino 2015). 


1.2.5 The Strength of Forecasting Sentiment in Economic Activity 

In his research, Fraiberger (2016) found that sentiment extracted from economic news articles tracks 
fluctuations in GDP growth and is a key indicator of GDP growth at the country level. Shapiro, 
Sudhof, and Wilson (2017) show that news sentiment has the power to predict future economic activity 
(Fraiberger 2016; Shapiro et al 2017). Gerrish and Blei (2011) assumed that the strength of sentiment 
prediction for macroeconomic variables of two countries is reflected in the news articles that mention them 
together (Gerrish & Blei, 2011). 


III. METHODOLOGY 
This research will be conducted through the following stages: 
1. Literature study of relevant research themes and methods 
2. Designing the Python algorithms needed for the data collection process 
3. Trialing the algorithms that have been designed 
4. Analyzing the numerical data with exploratory and descriptive statistics 
5. Regression analysis of the prepared data to determine the correlation between the variables and their 
influence on one another 
6. Testing the data with a t-test 




The subject of this research is the machine-learning used by the author, based on the algorithm cheat 
sheet, with Python as the analysis tool. The object of this research is 
international mass media sentiment and macroeconomic indicators in Indonesia, Malaysia, Thailand, Singapore, 
and Brunei Darussalam. Sentiment data from international online media reports was obtained from GDELT 
and went through preparation and preprocessing before being processed statistically together with data on the 
economic indexes of the five ASEAN countries obtained from the World Development Indicators (WDI) of the 
World Bank. The data used in this research spans 16 years, from 2003 to 2019. 

Data analysis is the part of a study carried out to solve the problem under investigation. The 
appropriate use of analysis tools largely determines the accuracy of the conclusions drawn about the hypotheses. 
The analysis in this research is an inferential-correlational quantitative analysis, that is, a statistical 
analysis that seeks relationships or influences between two or more variables consisting of independent 
variables and dependent variables. Table 1 explains the types of correlational analysis according to the scale of 
the research data used (Neter & Kutner, 1983). 


TABLE 1. Types of correlational analysis with data scale 
Source: Python processed data 

Descriptive analysis is used to understand how GDELT data reflects real-world events, and is carried out 
by visualizing simple counts of news items. The datasets provide near-real-time insights into 
what is happening in different places in the five ASEAN countries. There are several important issues, especially 
in GDELT data, that the author has confirmed and that other research has also reported. 
After all raw GDELT data stored in MySQL is cleansed and filtered, SQL queries 
recapitulate the data annually, and the result is stored in a dataset to facilitate further analysis. The dataset is then 
combined with the WDI indicator data into a subset and stored in a CSV file to simplify the analysis. 
As explained earlier, there is an issue regarding the difference in the AvgTone calculation range in GDELT data 
between news before and after 2013: the average AvgTone values for news after 
2013 are calculated lower than those for news before 2013. As the last step, this research combined the 
information from GDELT (international news sentiment) and the WDI (World Development 
Indicators) datasets from the World Bank on macroeconomic indicators of the five ASEAN countries in the 
period of 2002-2019. 
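
The combination of the yearly sentiment data with the WDI indicators can be sketched as a join on a 
country-year key. This is a minimal illustration under stated assumptions: the two small tables, their column 
names, and all values are hypothetical placeholders, and the pandas merge shown here is one possible way to 
build the subset, not necessarily the one used in this research. 

```python
import pandas as pd

# Hypothetical yearly sentiment table, as would result from the annual SQL recapitulation.
sentiment = pd.DataFrame({
    "Country": ["Indonesia", "Indonesia", "Malaysia", "Malaysia"],
    "Year": [2003, 2004, 2003, 2004],
    "AvgTone_Norm": [0.5, 2.5, -2.5, -0.5],
})

# Hypothetical WDI extract; the indicator values are placeholders, not real statistics.
wdi = pd.DataFrame({
    "Country": ["Indonesia", "Indonesia", "Malaysia", "Malaysia"],
    "Year": [2003, 2004, 2003, 2004],
    "Inflation": [1.0, 2.0, 3.0, 4.0],
    "FDI": [10.0, 20.0, 30.0, 40.0],
})

# Inner join on the country-year key keeps only rows present in both sources;
# validate raises an error if a key is duplicated on either side.
subset = sentiment.merge(wdi, on=["Country", "Year"], how="inner", validate="one_to_one")

# Serialize to CSV text; passing a file path to to_csv would write a file instead.
csv_text = subset.to_csv(index=False)
print(csv_text)
```

The one-to-one validation is a design choice that guards against duplicated country-year rows surviving the 
deduplication step. 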


IV. RESULTS AND DISCUSSION 

4.1 Correlation Analysis 

A linear regression model analyzes the relationship between a response variable and one or more 
predictor variables. The model parameters are estimated from the data and used to model the relationship as a 
linear function. This is done by fitting the model coefficients to the data using the least squares approach, which 
minimizes the residual sum of squares (Haren, 2017). Correlation analysis is used to examine the co-movement 
between the news sentiment variable and the economic indicator variables. In countries where no co-movement 
is found (that is, the correlation value is equal or close to 0), the pair is not analyzed further, because there 
is no linear relationship between the variables. The correlation measure used is the Pearson correlation, whose 
values range from -1 to 1. The call sns.pairplot(data, hue='Country') from the Seaborn 
library is used to display the scatterplots, together with the calculated correlation values, with the output 
shown in the figure. 


Source: Python processed data 
FIGURE 2. The plot shows the relationship between variables 
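
The Pearson correlation used above can be sketched as follows. The three series are synthetic and only 
illustrate the computation; the helper name pearson_r is hypothetical. 

```python
import numpy as np

# Synthetic series standing in for AvgTone_Norm and two indicators (not the study's data).
tone = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
indicator_up = np.array([1.0, 3.0, 5.0, 7.0, 9.0])    # rises linearly with tone
indicator_down = np.array([4.0, 3.5, 2.0, 1.5, 1.0])  # tends to fall as tone rises

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

r_up = pearson_r(tone, indicator_up)
r_down = pearson_r(tone, indicator_down)
print(r_up, r_down)
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates none. For a whole 
table of indicators, the pandas method DataFrame.corr(method="pearson") returns the full correlation matrix. 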


The numbers in the table show the correlation values between variables. From these values, it can be 
concluded that each economic index variable is correlated with the news sentiment variable. Negative 
correlations are shown by Tax Rev, GDP Capita, Inflation, and Unemployment Rate. The positive 
correlations shown in the figure, in sequence from the most correlated with the news sentiment variable, are 
CAB, FDI, GDP Growth, Import Value, and Export Value. Of the nine economic index variables above, the 
most correlated with the news sentiment variable are CAB (0.777628), FDI (0.727840), and Inflation 
(-0.223941). The following is a machine-learning model of these three economic index variables compared with 
news sentiment (AvgTone_Norm). The statistical regression calculations for the variables that have a significant 
correlation, AvgTone-CAB, AvgTone-FDI, and AvgTone-Inflation, are given in the following table: 


TABLE 6. Statistical regression 

Coefficient    AvgTone-CAB    AvgTone-FDI    AvgTone-Inflation 
a              10.62296577    8.89958287     3.20713538 
b              12.33578554    10.89407501    -2.56330405 

Source: Python processed data 


Based on the table above, the regression equations, with coefficients rounded to two decimal places, can be 
written as follows: 


AvgTone - CAB : y = 10.62 + 12.34x 
AvgTone - Inflation : y = 3.21 - 2.56x 
AvgTone - FDI : y = 8.90 + 10.89x 
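
How coefficients of the form y = a + bx are obtained by least squares, and how such an equation is 
evaluated, can be sketched as follows. The fitted data is synthetic. The last lines reuse the AvgTone-CAB 
coefficients from Table 6 under the assumption, not stated explicitly in the text, that x stands for 
AvgTone_Norm and y for the indicator. 

```python
import numpy as np

# Synthetic data from a known line plus small fixed deviations (not the study's data).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3.0 + 2.0 * x + np.array([0.1, -0.1, 0.0, -0.1, 0.1])

# Least-squares fit of y = a + b*x. np.polyfit returns the highest degree first.
b, a = np.polyfit(x, y, deg=1)

def predict(a, b, x_new):
    """Evaluate the fitted line y = a + b*x at x_new."""
    return a + b * x_new

print(round(float(a), 2), round(float(b), 2))

# Table 6 coefficients for AvgTone-CAB; x is assumed to be AvgTone_Norm.
cab_a, cab_b = 10.62296577, 12.33578554
cab_at_zero = predict(cab_a, cab_b, 0.0)  # at x = 0 the prediction equals the intercept
cab_at_one = predict(cab_a, cab_b, 1.0)   # one unit of x adds the slope b
```

Under that reading, the slope b is the change in the indicator associated with a one-unit change in normalized 
tone, and the intercept a is the predicted value when the tone equals its mean. 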


This research analyzed the effect of international news sentiment on macroeconomic indicators in five 
ASEAN countries. News articles are able to create certain opinion sentiments through the evaluation of news 
topics by the reported actors and through implicit or explicit judgments (Noelle-Neuman & Mathes, 1987). News 
articles can give the public information about economic conditions in the form of statistics and expert opinions. 
Several studies have shown the relationship between sentiment and macroeconomic variables and the predictive 
power of sentiment, but not for one particular macroeconomic indicator. Barsky and Sims (2012) stated that 
sentiment can convey information that cannot be observed in objective economic variables (Barsky & Sims, 
2012). According to Juriova (2015), sentiment towards foreign trading partners can be used to explain real 
fluctuations in Gross Domestic Product (GDP), consumer prices, and exchange rates. Li and Makino (2015) 
found that positive sentiment towards foreign countries facilitated foreign direct investment (FDI), while 
negative sentiment led to lower investment. They also stated that negative sentiment had a stronger impact than 
positive sentiment (Juriova 2015; Li & Makino 2015). Fraiberger (2016) found that sentiment extracted from 
economic news articles tracks fluctuations in GDP growth and is a key indicator of GDP growth at the country 
level. Shapiro, Sudhof, and Wilson (2017) show that international news sentiment has the power 
to predict future economic activity (Fraiberger 2016; Shapiro et al 2017). Gerrish and Blei (2011) assumed 
that the strength of sentiment prediction for macroeconomic variables of two countries is reflected in the news 
articles that mention them together (Gerrish & Blei, 2011). 

In line with the concept above, this research concluded that each economic index variable 
is correlated with the news sentiment variable. Negative correlations are shown by Tax Rev, GDP Capita, 
Inflation, and Unemployment Rate. The positive correlations shown in the figure, in sequence from the 
most correlated with the news sentiment variable, are CAB, FDI, GDP Growth, Import Value, and Export Value. 
This research found that the variables most correlated with news sentiment are CAB (0.777628), FDI 
(0.727840), and Inflation (-0.223941). A machine-learning model was built for these three economic index 
variables compared with news sentiment (AvgTone_Norm). Statistical regression calculation was performed for 
the variables that have a significant correlation with news sentiment: CAB (AvgTone-CAB), FDI (AvgTone-FDI), 
and Inflation (AvgTone-Inflation). 


4.2 T-Test (Model Feasibility Test) 
a. One-Sample T-Test 

The one-sample t-test evaluates the null hypothesis that the mean of the sample dataset is equal 
to the mean of the population from which the sample is drawn. The results of the AvgTone_Norm t-test can be 
seen in the appendix. 





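from scipy import stats 

# AvgTone_Norm values from the combined dataset 
a = data['AvgTone_Norm'].values 
true_mu = 0  # hypothesized population mean 

onesample_results = stats.ttest_1samp(a, true_mu) 
print(str(onesample_results)) 


Ttest_1sampResult(statistic=2.2448584442337268e-15, pvalue=0.9999999999999982) 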
Source: Python processed data 
FIGURE 3. The result of One-Sample T-test of AvgTone_Norm variable 





The p-value of the AvgTone_Norm t-test (approximately 1.0) is greater than the 0.05 significance level, 
so the null hypothesis (H-0) is not rejected. This outcome is expected, because mean subtraction centers 
AvgTone_Norm at zero by construction. 
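
The decision rule for the one-sample test can be sketched on synthetic values. The numbers below are 
illustrative and are not the AvgTone_Norm data; the sketch shows that any mean-subtracted sample tested 
against a mean of 0 yields a t-statistic near zero and a p-value near 1. 

```python
import numpy as np
from scipy import stats

# Synthetic tone values (not the study's data).
raw = np.array([1.2, -0.4, 2.7, 0.9, -1.1, 3.0])
centered = raw - raw.mean()  # mean subtraction

alpha = 0.05
result = stats.ttest_1samp(centered, popmean=0.0)

# Decision rule: compare the p-value with alpha, not with the t-statistic.
reject_h0 = bool(result.pvalue < alpha)
print(result.statistic, result.pvalue, reject_h0)
```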
b. Two Sample T-Test 

The test involves two separate samples, assumed to have equal variance, and investigates whether their 
means are identical, as would be expected if both were taken from the same population. 

from scipy import stats 


a = data['AvgTone_Norm'].values 
b = data['FDI'].values 


twosample_results = stats.ttest_ind(a, b) 
print(str(twosample_results)) 


Ttest_indResult(statistic=-6.7856890655659345, pvalue=2.594559996004258e-10) 


Source: Python processed data 
FIGURE 4. The result of Two Sample T-test of AvgTone_Norm and FDI variables 
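
A two-sample comparison can be sketched on synthetic values as follows. The samples are illustrative and not 
the study's data. The equal_var=False call (Welch's variant) is shown only as an option for samples with 
unequal variances; it is not the setting reported in Figure 4. 

```python
import numpy as np
from scipy import stats

# Synthetic samples with clearly different means (not the study's data).
sample_a = np.array([-0.5, 0.1, 0.4, -0.2, 0.3, -0.1])
sample_b = np.array([8.2, 9.1, 10.4, 7.9, 9.6, 8.8])

student = stats.ttest_ind(sample_a, sample_b)                 # assumes equal variances
welch = stats.ttest_ind(sample_a, sample_b, equal_var=False)  # Welch's variant

alpha = 0.05
print(student.pvalue < alpha, welch.pvalue < alpha)
```

The test speaks only to whether the two averages differ; the strength of the relationship between the two 
variables is measured by the correlation and regression analysis. 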


Because the p-value (2.59e-10) is far below the 0.05 significance level, the null hypothesis (H-0) 
that the two means are identical is rejected. Economic policymakers (governments of each country) and market 
participants rely on a broad array of models that incorporate soft information. As opposed to hard information, 
which includes objective and directly quantifiable variables such as production and employment, soft information 
includes subjective measures concerning attitudes about current and future economic conditions (Shapiro et al, 
2017). In this section, we aggregate the news article sentiment scores into macroeconomic indexes and assess 
their correlation with these survey measures. A strong correlation would help validate that our news sentiment 
measure is not pure noise and is capturing information similar to that of the surveys. 


V. CONCLUSION 

Analysis and testing of the correlation and influence (significance) of international media 
news sentiment from GDELT on the macroeconomic indicators of ASEAN countries produced both negative and 
positive results. The analysis used a machine-learning model, namely Multiple Linear Regression Analysis, 
chosen on the assumption that the link between the independent 
variable and the dependent variable is linear. The results show that international mass media 
sentiment, normalized using the mean subtraction method and named AvgTone_Norm, can 
give positive and negative signals about the macroeconomic conditions of ASEAN countries as 
measured by various indicators, namely Tax Revenue, Current Account Balance (CAB), 
GDP Capita, GDP Growth, Inflation, Unemployment, Foreign Direct Investment (FDI), Export, and Import. 
The international news sentiment variable is correlated with the macroeconomic indexes of ASEAN countries, 
as indicated by the existence of both positive and negative correlations. Positive 
correlation with international news sentiment is shown by CAB, FDI, GDP Growth, Import Value, and Export 
Value. Meanwhile, negative correlation is shown by Tax Revenue, 
GDP Capita, Inflation, and Unemployment. From the table above, it is 
known that not all of these indicators can reflect the macroeconomic conditions of the ASEAN countries. Of the 
9 economic index variables, 3 are most strongly correlated with the news sentiment variable, namely 
CAB (0.777628), FDI (0.727840), and Inflation (-0.223941). 

After the correlation and influence of the two variables (dependent and independent) were analyzed, 
the regression coefficients were tested with t-tests, which are intended to test whether the parameters (regression 
coefficients and constants) estimated for the multiple linear regression model are appropriate. 
Two t-tests were conducted, namely a one-sample t-test and a two-sample t-test. For the 
one-sample t-test on AvgTone_Norm, the p-value of 0.9999 is greater than the 0.05 significance level, 
so the null hypothesis (H-0) is not rejected. Meanwhile, 
for the two-sample t-test, the p-value of 2.5945599e-10 is far below the 0.05 significance level, 
so the null hypothesis (H-0) that the two means are identical is rejected. 